Introduction


Classifying the sentiment of consumer text statements (whether from
social listening, product reviews, survey open-ends, or other sources)
has been an important research tool since the early days of digital
analytics. Analyzing consumers’ sentiments toward brands and other
entities has helped businesses with brand strategy, messaging
strategies, customer experience, M&A, and more.


However, the challenge has always been that automated sentiment
classification, including the models built into the largest social
listening tools on the market, has not been sufficiently accurate for
most strategic decisions. Automated systems struggle with sentiment in
context: major tools like Brandwatch claim only 60%-75% accuracy, and
we have found accuracy below that range in more complex categories.
The alternative, human sentiment classification, is accurate and
customizable to the context but too time-consuming and expensive for
most use cases. Fortunately, the emergence and rapid evolution of
generative AI in large language models (LLMs), like those behind
ChatGPT, Gemini, and other AI interface tools, has created an
opportunity to finally make meaningful progress toward research-grade
sentiment models that can scale.


Leveraging this new LLM technology, Kantar’s Dx Analytics practice has
built a novel approach to sentiment classification that has
dramatically improved accuracy compared to any of the other automated
sentiment tools we have used over the past ten years.

The power of this new generation of AI models allows us to:


— Separately judge the sentiment towards 
each brand/entity being discussed by 
asking the LLM to read the statement 
from each entity’s perspective. This adds 
significantly to classification accuracy for 
statements which compare brands, which 
are often left as “mixed” or “neutral” in 
standard automated sentiment tools. 


— Customize sentiment judgment to the 
brand context. This prevents context-specific 
interpretation errors, like when words such 
as “breakup” or “heartbreak” are coded as 
negative in conversations about brands like 
Ben & Jerry’s, when they should be positive. 


— Recognize indirect sentiment towards a given 
brand/entity through context. For example, 
this allows the model to know that sentiment 
towards “Cherry Garcia” is a reflection on 
Ben & Jerry’s without an analyst needing to 
maintain long flavor/SKU taxonomies.


In this article, we will share how we have 
designed this novel sentiment solution and 
the benefits this can provide for brands trying 
to understand customers’ concerns. 
Entity-based sentiment classification with LLMs


To bring the power of LLMs into 
our sentiment pipelines, we use 
a dynamic prompting approach. 


Using API plugins built by our data engineering 
team, our analyst teams run each consumer 
statement through the LLM with a prompt 
that includes: 


A. A fine-tuned contextual guide explaining the key strategic issues
the model should be aware of when judging sentiment in this context,
B. A structural guide outlining the thought process to use and the
format for the output,
C. Which brand/entity to judge the sentiment for, and finally
D. The full natural-language statement to read for sentiment (with
light text cleaning to limit bias and noise).


By cycling through each brand/entity in a set 
of competitors, we can separately understand 
the sentiment expressed in the statement 
towards each. 
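The prompt structure and entity loop described above can be sketched in a few lines of Python. This is an illustrative sketch, not Kantar's actual pipeline: the guide wording, the function names (`build_prompt`, `classify_by_entity`), the `UNPARSED` fallback label, and the `stub_llm` stand-in for a real LLM API call are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical prompt pieces; the wording is illustrative only.
CONTEXT_GUIDE = (
    "You are judging consumer sentiment in conversations about electric "
    "vehicles. Judge only the attitude expressed toward the brand named below."
)
STRUCTURE_GUIDE = (
    "Consider what the statement says about that brand, then answer with "
    "exactly one word: POSITIVE, NEGATIVE, or NEUTRAL."
)
VALID_LABELS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}


def clean_text(statement: str) -> str:
    """Light cleaning: collapse runs of whitespace and trim the ends."""
    return " ".join(statement.split())


def build_prompt(entity: str, statement: str) -> str:
    """Assemble parts A-D of the prompt for a single brand/entity."""
    return "\n\n".join([
        CONTEXT_GUIDE,                          # A. contextual guide
        STRUCTURE_GUIDE,                        # B. structural guide
        f"Target brand: {entity}",              # C. entity to judge
        f"Statement: {clean_text(statement)}",  # D. the statement itself
    ])


def classify_by_entity(
    statement: str,
    entities: List[str],
    ask_llm: Callable[[str], str],
) -> Dict[str, str]:
    """Run the statement through the LLM once per entity."""
    results: Dict[str, str] = {}
    for entity in entities:
        answer = ask_llm(build_prompt(entity, statement)).strip().upper()
        # Anything outside the expected labels is flagged for human review
        # (an assumption of this sketch, not a documented behavior).
        results[entity] = answer if answer in VALID_LABELS else "UNPARSED"
    return results


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; returns canned answers."""
    canned = {"Honda": "negative", "Toyota": "negative",
              "Tesla": "positive", "Rivian": "positive"}
    for line in prompt.splitlines():
        if line.startswith("Target brand: "):
            return canned.get(line[len("Target brand: "):], "neutral")
    return "neutral"
```

Passing the LLM call in as a function keeps the loop independent of any one provider, so swapping models only means swapping the callable.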
“Neither Honda nor Toyota has been making EVs for a long time.
Toyota launched its first U.S. EV, which was facing reliability issues, 
while Honda has yet to introduce an EV in the U.S. market. Tesla and 
Rivian, the new ones, do great, and Kia/Hyundai has some pretty 
good EVs, albeit a little more plasticy.” 


ENTITY-SPECIFIC LLM PROMPT

Honda     Sentiment: NEGATIVE
Toyota    Sentiment: NEGATIVE
Tesla     Sentiment: POSITIVE
Rivian    Sentiment: POSITIVE


Figure 1: Example of entity-level prompt ingestion into the LLM engine 


In the figure above, the consumer statement 
shown was run through the LLM API four times, 
once for each of Honda, Toyota, Tesla, and 
Rivian. This resulted in four different sentiment 
classifications, with Honda and Toyota both 
being discussed negatively while Tesla and 
Rivian were discussed positively. 
Note: We have built pipelines using the above approach at scale with
both GPT and Gemini, but, importantly, we are not tied to these
models. As part of Kantar’s Dx team efforts to explore and implement
cutting-edge AI models in our products, we periodically test a variety
of LLMs, including Google’s models (Gemini and PaLM), OpenAI’s models
(in the GPT family), and Meta’s models (LLaMA), to understand and
apply the most efficient models in our processes. These foundational
models have proven to be remarkably “plug-and-play” for many purposes,
as the Q&A interface doesn’t require our developers to learn an
entirely new syntax to test new models.


Game-changing accuracy with our AI sentiment engine


Importantly, we have found these LLM-powered sentiment models to be
significantly more accurate than other market-leading social listening
platforms’ sentiment models. But that finding requires us first to
answer the following questions: within sentiment classification, what
is “accuracy” in our definition, and how does Kantar measure it?

Kantar’s Dx Analytics practice measures and optimizes the accuracy of
our sentiment models by comparing the automated results with the
results from expert human review. We define the accuracy of a
sentiment model with a blend of three key metrics: the classification
F1 score, the error in Net Sentiment Scores (NSS), and the degree of
Sentiment Inversions.


Classification F1 score

As in traditional machine learning classification models, we use F1
to assess the accuracy of the model’s observation-level
classifications. This classification F1 score (the harmonic mean of
precision and recall) summarizes how closely the model’s answers match
the human’s answers in our training dataset.
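As a rough illustration of the metric, the sketch below computes a per-class F1 (the harmonic mean of precision and recall) from paired human and model labels and averages it across the three sentiment classes. The function names and toy labels are invented for the example, and the unweighted "macro" average is an assumption, since the article does not say how per-class scores are combined.

```python
from typing import Sequence

LABELS = ("negative", "neutral", "positive")


def f1_for_class(human: Sequence[str], model: Sequence[str], label: str) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    true_pos = sum(1 for h, m in zip(human, model) if h == label and m == label)
    predicted = sum(1 for m in model if m == label)  # model said `label`
    actual = sum(1 for h in human if h == label)     # human said `label`
    if true_pos == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_pos / predicted
    recall = true_pos / actual
    return 2 * precision * recall / (precision + recall)


def macro_f1(human: Sequence[str], model: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores (an assumed averaging rule)."""
    return sum(f1_for_class(human, model, label) for label in LABELS) / len(LABELS)


# Toy example: four statements with human and model labels.
human_labels = ["positive", "positive", "negative", "neutral"]
model_labels = ["positive", "neutral", "negative", "neutral"]
print(round(macro_f1(human_labels, model_labels), 3))  # 0.778
```

In the toy data, the model misses one positive statement by calling it neutral, which lowers recall for the positive class and precision for the neutral class while leaving the negative class perfect.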
Net Sentiment Score error

However, optimizing sentiment models on F1 alone can produce models
that are very accurate for positive statements but mediocre for
negative statements, resulting in unbalanced error and skewed results.
Kantar’s approach goes one step further. Beyond assessing how often
the model’s individual sentiment predictions are correct, we also
monitor the difference in Net Sentiment Scores (NSS) between the model
and human-level analyses to ensure that the model accurately captures
the overall sentiment expressed in the data. We define Net Sentiment
as the degree to which positive mentions outweigh negative mentions in
the overall dataset (% positive minus % negative). Our models strive
to limit the NSS error to within five percentage points.


Sentiment inversions

Finally, we also need to consider our relative concern about different
error types. Most humans reviewing the same statements will disagree
to some extent on which mentions are “neutral” vs. having a clear
sentiment; these errors in neutrality are less material than errors
with opposite sentiment judgments. For this reason, we also consider
the Sentiment Inversion score, which measures the degree to which the
model labels true positives “negative” and true negatives “positive.”
As this type of error is most damaging to the accuracy of insights, we
require our models to have fewer than two percent inversions.
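A minimal sketch of an inversion score follows. It counts only positive/negative flips and ignores disagreements that involve "neutral". The function name and toy labels are illustrative, and dividing by the total number of statements is an assumption, since the article does not specify the denominator.

```python
from typing import Sequence

# Pairs of (human label, model label) that count as an inversion.
OPPOSITES = {("positive", "negative"), ("negative", "positive")}


def inversion_rate(human: Sequence[str], model: Sequence[str]) -> float:
    """Share of all statements where the model flips positive and negative."""
    if len(human) != len(model):
        raise ValueError("human and model label lists must be the same length")
    if not human:
        return 0.0
    flips = sum(1 for pair in zip(human, model) if pair in OPPOSITES)
    return flips / len(human)


# Toy example: one true positive is labeled negative; the two mismatches
# involving "neutral" do not count as inversions.
human_labels = ["positive", "negative", "neutral", "positive", "negative"]
model_labels = ["negative", "negative", "positive", "positive", "neutral"]
print(inversion_rate(human_labels, model_labels))  # 0.2
```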


When evaluated together, these metrics ensure that the model performs
well with imbalanced datasets and provides the most meaningful signals
of consumer sentiment. An example of the incremental accuracy of this
approach is shown below.


KANTAR’S AI SENTIMENT (POWERED BY PALM/GPT)
Rows: human markup. Columns: model classification.

              Negative    Neutral   Positive
Negative          2211        217         41
Neutral            465       1187        271
Positive           102        204       2871


STANDARD PLATFORM’S SENTIMENT
Rows: human markup. Columns: model classification.

              Negative    Neutral   Positive
Negative           521       1890         58
Neutral            179       1707         37
Positive           145       2868        164


Kantar AI sentiment accuracy: 84%
Kantar net-sentiment error: -4pp
Standard platform sentiment accuracy: 45%
Standard platform net-sentiment error: -17pp

Sentiment accuracy: Calculated as the classification F1 score
(harmonic mean of precision and recall).

Net-sentiment error: Calculated as the percentage point error between
the net sentiment (% pos - % neg) in the human markup and the model
classification.

Figure 2: Confusion matrices comparing the accuracy of Kantar’s AI
sentiment engine to the built-in automated sentiment in a leading
social listening platform.


This comparison shows an 84% F1 from our model vs. a 45% F1 from the
market-leading social listening tool, a remarkable improvement in
classification accuracy. The NSS error was -4pp in our solution vs.
-17pp from the standard tool, and neither model had significant
sentiment inversions.


This comparison was performed using conversation about regulatory
issues for large technology companies in the United States, a fairly
nuanced topic area. To be fair to our platform partners, the
incremental accuracy of this approach vs. standard automated sentiment
will vary by topic area, with the largest differences occurring in
nuanced topic areas with significant comparisons between
brands/entities. However, many Kantar clients need sentiment analysis
most when they are operating in complex contexts, and we have found in
practice that these differences in accuracy are meaningful for
strategic guidance.
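The net-sentiment errors quoted for Figure 2 can be reproduced from its confusion matrices. The sketch below is illustrative rather than Kantar's actual code. It assumes that rows hold the human label and columns the model label, and it takes the Negative/Positive cell of the Kantar matrix as 41, the value that makes that row's total match the same row in the standard platform's matrix. All names are invented for the example.

```python
from typing import Dict

Matrix = Dict[str, Dict[str, int]]

# Counts transcribed from Figure 2. Outer key: human label (assumed rows);
# inner key: model label (assumed columns).
KANTAR: Matrix = {
    "negative": {"negative": 2211, "neutral": 217, "positive": 41},
    "neutral": {"negative": 465, "neutral": 1187, "positive": 271},
    "positive": {"negative": 102, "neutral": 204, "positive": 2871},
}
STANDARD: Matrix = {
    "negative": {"negative": 521, "neutral": 1890, "positive": 58},
    "neutral": {"negative": 179, "neutral": 1707, "positive": 37},
    "positive": {"negative": 145, "neutral": 2868, "positive": 164},
}


def net_sentiment(counts: Dict[str, int]) -> float:
    """Net sentiment in percentage points: % positive minus % negative."""
    total = sum(counts.values())
    return 100.0 * (counts["positive"] - counts["negative"]) / total


def human_totals(matrix: Matrix) -> Dict[str, int]:
    """Row sums: how many statements the human reviewers gave each label."""
    return {label: sum(row.values()) for label, row in matrix.items()}


def model_totals(matrix: Matrix) -> Dict[str, int]:
    """Column sums: how many statements the model gave each label."""
    return {label: sum(row[label] for row in matrix.values()) for label in matrix}


def nss_error(matrix: Matrix) -> float:
    """Model net sentiment minus human net sentiment, in percentage points."""
    return net_sentiment(model_totals(matrix)) - net_sentiment(human_totals(matrix))


print(round(nss_error(KANTAR)))    # -4
print(round(nss_error(STANDARD)))  # -17
```

Under these assumptions both figures match the values reported for Figure 2, which supports reading the rows as the human markup and the columns as the model classification.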


Conclusion & Next steps


Kantar’s LLM-driven sentiment engine, alongside our broader approaches
to processing, mining, and interpreting text and images using AI
models, is an exciting area of innovation for us. It allows us to
bring novel insights and analytical approaches to our clients. For
instance, this fine-tuned sentiment engine is powering Kantar’s
digital brand tracking solution BrandDigital, allowing that solution
to track brand equity for massive brand sets continuously and
providing deep-dive insights into the specific products, attributes,
and perceptual issues driving sentiment in the category.


Further, through developing and productizing these early GenAI
features, Kantar’s Analytics practice is deepening our practical
knowledge of the inner workings and optimization strategies for this
burgeoning technology. With this sentiment solution, our AI expertise
has helped us solve a nagging social listening problem, but the
innovation continues from here.


As we learn to build our analytical and brand strategy expertise into
LLMs (and our broader AI infrastructure), we will continue to roll out
new updates on deeper data mining approaches, brand positioning
alerts, and additional analysis layers for image and video content.


We’d love to hear from you!

If you have any thoughts or would like to learn more about Kantar’s AI
solutions and digital analytics capabilities, please contact us at
DxNALeads@kantar.com.


About Kantar 

Kantar is the world’s leading marketing data and analytics business and an indispensable 
brand partner to the world’s top companies. We combine the most meaningful attitudinal 
and behavioural data with deep expertise and advanced analytics to uncover how people 
think and act. We help clients understand what has happened and why and how to shape 
the marketing strategies that shape their future. 